Correia Consultant Agency
Introduction
Starbucks’ gargantuan $26 billion of revenue in 2019 is resounding evidence that Americans, above everything, love their coffee. They prove it time and time again, most recently during the current pandemic: some, unable to relinquish their daily ‘bucks, waited hours on end in long drive-thru lines to get their fix when coffee shops were closed for in-person operations.
It is no surprise, then, that Starbucks owns more than 8,000 stores across the US and continues to grow every day. These numbers reflect a company that is about more than just coffee: it has a significant influence on US society and culture through (often controversial) initiatives such as removing religious references from holiday-themed cups. In addition, Starbucks is known for treating its employees extremely well by providing health coverage, tuition coverage, and 401(k) plans, which further reinforces its highly regarded brand.
As such, the astounding popularity of the chain in the country and the prospect of further expansion raise important questions. What can the current distribution of Starbucks stores tell us about societal factors across the country? Which factors should the brand consider when expanding into new locations? How can companies like Starbucks, which pride themselves on a positive social and environmental outlook, incorporate such values into their corporate strategy, especially in the current pandemic?
Overview of Analysis
To answer these questions, we sought to understand the relationship between state-level social and economic factors, such as income and unemployment, and the distribution of Starbucks locations in the US. Furthermore, we wanted to understand what company behavior and decision-making could look like with a more socially minded approach, rather than one that considers profits alone. Starbucks shops, after all, can be significant contributors to social good on a local level by providing stable jobs that open up avenues for growth and education.
As such, in this project, we conduct a consulting case study of Starbucks, analysing what insights can be gained from the current distribution of stores in the US and providing socially minded recommendations for new locations into which the brand can expand. We start with a spatial analysis of Starbucks stores across the US, exploring the relationship between store density and the income and unemployment levels in each state. We then complement this analysis with a text-based sentiment analysis of recent tweets mentioning Starbucks, and explore how these sentiments vary by location. Finally, we use these variables to conduct a clustering analysis of US states, in order to determine the different “types” of states in which Starbucks operates. We then use these clusters to identify the ideal locations for the brand to build new stores.
Data
In order to conduct this analysis, we collected the following data:
- The Starbucks Locations Worldwide dataset, available on Kaggle and provided by Starbucks Corporation, including a record for every Starbucks or subsidiary store location that was in operation as of February 2017.
- The US Household Income Statistics dataset, available on Kaggle and provided by the Golden Oak Research group, containing information on US cities’ average income.
- The US Unemployment Rate by County dataset, available on Kaggle and provided by the US Department of Labor’s Bureau of Labor Statistics, containing unemployment and population data at the US county level from 1990 to 2016.
- The most recent 18,000 tweets containing the string “Starbucks” or “starbucks” in their text, scraped using the rtweet package to interface with the Twitter API.
Check the links in the bullets above for access to each of the data sources!
Distribution of Starbucks Across the US
TEXT EXPLAINING INTRODUCTION VISUAL OF STARBUCKS LOCATIONS Text explaining map, transition from current clusters of starbucks to clusters of potential locations.
General Process Towards Selecting New Location: 1. Find States 2. Find Counties
Clustering by State
Paragraph about how we went about determining clusters of states, and explaining why we want to use clusters and explaining factors we use
State-Level Factors
- Consumer Sentiments
- Average Income
- Average Unemployment Rate
- Number of Starbucks
Consumer Sentiments Across the US
Before including consumer sentiment in the clustering analysis, we first examined it on its own. Our initial hypothesis was that states with a higher density of Starbucks stores would tweet more positive and fewer negative opinions. However, the plots below show that states with more Starbucks stores, such as California, Texas, Florida, and New York, all have a lower proportion of positive words and a higher proportion of negative words in tweets than other states in the US. Conversely, states with fewer Starbucks stores, such as Wisconsin, Iowa, Kansas, and Mississippi, had a higher proportion of positive words and a lower proportion of negative words in tweets. As far as consumer sentiment is concerned, this suggests that it may be more beneficial for Starbucks to build stores in states where it is not yet heavily established.
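To make the "proportion of positive words" measure concrete, here is a minimal Python sketch of how such per-state proportions could be computed. It is an illustration only, not the code behind the report: the tiny word lists, the sample tweets, and the choice to divide by all words in a state's tweets (rather than by sentiment-bearing words only) are assumptions.

```python
from collections import defaultdict

# Hypothetical mini-lexicon; a real analysis would use a full sentiment lexicon.
POSITIVE = {"love", "great", "good", "delicious"}
NEGATIVE = {"hate", "bad", "slow", "wrong"}

def sentiment_proportions(tweets):
    """Map each state to (positive share, negative share) of all words tweeted there."""
    counts = defaultdict(lambda: [0, 0, 0])  # positive, negative, total words
    for state, text in tweets:
        for raw in text.lower().split():
            word = raw.strip(".,!?#@\"'")  # drop surrounding punctuation
            if not word:
                continue
            tally = counts[state]
            tally[0] += int(word in POSITIVE)
            tally[1] += int(word in NEGATIVE)
            tally[2] += 1
    return {
        state: (pos / total, neg / total)
        for state, (pos, neg, total) in counts.items()
        if total
    }

# Made-up (state, tweet text) pairs, for illustration only.
sample = [
    ("IA", "I love my Starbucks latte"),
    ("IA", "great service today"),
    ("CA", "Starbucks line was slow and my order was wrong"),
]
proportions = sentiment_proportions(sample)
print(proportions)
```

Dividing by the total word count keeps states with many tweets comparable to states with few, which matters when tweet volume differs as much as it does between California and Iowa.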
State-Level Clusters
Explain cluster analysis and elbow plot helped to determine 5 clusters were needed
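As a rough illustration of what an elbow plot measures, the Python sketch below runs a plain k-means for several values of k and records the total within-cluster sum of squares for each. The report does not state which clustering algorithm was used, so k-means is an assumption here, and the toy points are invented; the "elbow" is the value of k after which the sum of squares stops dropping sharply.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_wss(points, k, iterations=50, seed=0):
    """Run Lloyd's k-means and return the total within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Invented toy data with two obvious groups; real inputs would be scaled state features.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
wss_by_k = {k: kmeans_wss(points, k) for k in range(1, 5)}
print(wss_by_k)
```

On this toy data the sum of squares falls steeply from k = 1 to k = 2 and only slightly afterwards, which is the pattern one looks for when reading the number of clusters off an elbow plot.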
Visualization of Cluster
Based on the elbow plot, we chose 5 clusters, which are shown in this visualization:
Cluster Characteristics
Explain we can see characteristics of each cluster through this matrix. Explain each cluster’s unique properties
Final List of States
Then, we calculated the center of each cluster and found the state that best represents it: the state with the minimal distance between its value for each characteristic and the cluster center. From this analysis, we concluded that the final 5 states, each representative of its own cluster, are:
- Virginia
- Vermont
- Tennessee
- Iowa
- California
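The selection of a representative state can be sketched as a nearest-to-center lookup. The Python example below is a hedged illustration: the state names, the feature values, and the use of Euclidean distance on standardized features are all assumptions, not details taken from the analysis.

```python
import math

def representative(members, center):
    """Return the member whose feature vector is closest (Euclidean) to the center."""
    return min(members, key=lambda name: math.dist(members[name], center))

# Hypothetical standardized features per state:
# (sentiment, average income, average unemployment rate, number of Starbucks)
cluster = {
    "State A": (0.2, 1.1, -0.4, 0.9),
    "State B": (0.1, 0.8, -0.2, 0.7),
    "State C": (-0.3, 0.2, 0.1, 0.3),
}

# The cluster center is the feature-wise mean of its members.
center = tuple(sum(col) / len(cluster) for col in zip(*cluster.values()))
closest = representative(cluster, center)
print(closest)
```

Standardizing the features first matters in a setup like this, because otherwise income (tens of thousands) would dominate unemployment rates (single digits) in the distance calculation.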
Clustering by County
County-Level Factors
- Average Income
- Average Unemployment Rate
- Number of Starbucks
Step-by-Step Process
- Cluster Analysis
- Creating clusters based on appropriate number of centers
- Determine the most suitable cluster of counties
Determining the Number of Clusters for Each State
Virginia
Cluster Analysis
Cluster characteristics, explain why we chose cluster 4
Vermont
Cluster Analysis
Cluster characteristics, explain why we chose cluster 1
Tennessee
Cluster Analysis
Visualizing clusters
Cluster characteristics, explain why we chose cluster 4
Iowa
Cluster Analysis
Visualizing clusters
Cluster characteristics, explain why we chose cluster 5
California
Cluster Analysis
Visualizing clusters
Cluster characteristics, explain why we chose cluster 1
Final Recommendations
Summarize overall process and thinking and impact of starbucks. Show the data table of all important counties from clusters we chose:
| County | Avg. Income (USD) | Num. Starbucks | Avg. Unemployment Rate (%) |
|---|---|---|---|
| Fresno | 55523.57 | 36 | 12.326804 |
| Glenn | 53148.00 | 0 | 11.998625 |
| Kern | 49580.61 | 39 | 11.487972 |
| Madera | 49355.76 | 5 | 11.890378 |
| Merced | 49976.11 | 7 | 13.486942 |
| Modoc | 43340.33 | 0 | 10.394502 |
| Monterey | 68913.00 | 1 | 9.826804 |
| Plumas | 64924.83 | 0 | 11.346392 |
| San Joaquin | 71281.69 | 24 | 10.629553 |
| Santa Cruz | 82669.83 | 7 | 11.217886 |
| Shasta | 53369.18 | 10 | 9.644330 |
| Stanislaus | 48900.86 | 7 | 11.919244 |
| Tehama | 48808.75 | 1 | 9.508935 |
| Tulare | 45336.69 | 17 | 13.573540 |
| Yuba | 55171.50 | 0 | 12.325086 |
| Adams | 65864.50 | 0 | 6.569547 |
| Appanoose | 49003.50 | 0 | 5.678395 |
| Benton | 57531.00 | 0 | 6.154527 |
| Buchanan | 63804.00 | 0 | 7.088080 |
| Butler | 61836.00 | 0 | 5.995722 |
| Calhoun | 43333.00 | 0 | 7.597668 |
| Carroll | 55611.00 | 0 | 5.938277 |
| Cherokee | 55162.00 | 0 | 6.286043 |
| Clarke | 43619.50 | 0 | 7.021777 |
| Clay | 46965.00 | 0 | 6.565042 |
| Clayton | 54832.50 | 0 | 5.518210 |
| Clinton | 59001.00 | 0 | 6.144743 |
| Crawford | 51775.50 | 0 | 6.761908 |
| Dallas | 78098.67 | 0 | 7.271241 |
| Delaware | 55472.17 | 0 | 5.291101 |
| Fayette | 55887.25 | 0 | 6.797021 |
| Franklin | 50078.00 | 0 | 5.943186 |
| Greene | 58107.83 | 0 | 6.827746 |
| Grundy | 64870.50 | 0 | 6.497376 |
| Hardin | 58296.00 | 0 | 6.925958 |
| Henry | 51271.00 | 0 | 6.525775 |
| Jackson | 47535.00 | 0 | 6.540742 |
| Jasper | 59793.11 | 1 | 6.459512 |
| Jefferson | 56939.00 | 0 | 6.596809 |
| Linn | 73450.25 | 9 | 7.200540 |
| Lyon | 54713.00 | 0 | 5.473609 |
| Marion | 47388.00 | 0 | 7.014973 |
| Marshall | 60222.00 | 0 | 6.233670 |
| Osceola | 49948.00 | 0 | 5.801299 |
| Pocahontas | 50181.50 | 0 | 6.703086 |
| Union | 43619.50 | 0 | 6.206117 |
| Warren | 60983.96 | 7 | 5.924982 |
| Washington | 49947.33 | 0 | 5.938779 |
| Webster | 54712.17 | 0 | 6.443786 |
| Chittenden | 83283.50 | 0 | 3.387654 |
| Brunswick | 35873.00 | 0 | 7.469872 |
| Cumberland | 46083.00 | 0 | 6.302316 |
| Grayson | 32076.00 | 0 | 7.063253 |
| Greensville | 33192.67 | 0 | 5.819667 |
| Halifax | 29411.00 | 0 | 8.662019 |
| Lee | 31678.50 | 0 | 7.173886 |
| Northampton | 39718.25 | 0 | 6.739979 |
| Prince Edward | 54858.00 | 0 | 6.603667 |
| Scott | 46810.00 | 0 | 5.713129 |
| Smyth | 39226.00 | 0 | 8.147333 |
| Carter | 41428.75 | 0 | 6.840901 |
| Chester | 24221.00 | 0 | 7.115124 |
| Fentress | 39775.00 | 0 | 9.083333 |
| Hardeman | 26503.00 | 0 | 7.497459 |
| Lawrence | 45854.50 | 0 | 7.462405 |
| Lewis | 41536.00 | 0 | 8.101558 |
| Macon | 40533.00 | 0 | 7.106605 |
| Monroe | 48462.00 | 0 | 6.983337 |
| Obion | 45863.00 | 0 | 7.635802 |
| Rhea | 38409.50 | 0 | 8.395988 |
| Unicoi | 47091.25 | 0 | 8.077778 |
| Van Buren | 41919.00 | 0 | 7.148339 |
| Weakley | 40819.00 | 0 | 6.972840 |
Limitations and Conclusion
Paragraph here
Citations
List of important packages/data sets